1 If you provide definitions, give a citation

2 We did not investigate the source of polymorphism, but the processes affecting its spatial distribution. Similarly, we certainly cannot discern the CAUSE of selection from our studies - we might obtain evidence for its presence.

3 A 20 m or 40 m separation between populations might be enough for allele frequencies not to be correlated by gene flow (or the populations might be close enough to show traces of gene flow from different habitats) - but to be sure, we would have to investigate that in our study, since we do not KNOW the relevant distances a priori. You need to be careful to distinguish assumptions, which may be wrong, from the evidence you have collected.

4 The terms 'Figure 1', 'Figure 2' etc are for FIGURES (pictures/graphs/maps etc). Do not use this term for tables, which should be called 'Table 1', 'Table 2' etc.

5 When you find a difference between what is reported in some of the literature and your results, have confidence in your own results. Do not be too hasty to attempt to explain it away as experimental error.

6 Chi square tests evaluate the difference in relative frequencies between two or more populations. Hence you should report which populations are being compared, and which categories were compared (e.g. populations 2 vs 3, the frequency of brown, pink, yellow unbanded, & yellow banded). It would be easier to understand the results if these were reported as a contingency table (example below), or shown graphically.

7 In order to evaluate a chi square value, you need to know the degrees of freedom

8 It is not obvious that having sample areas of different size in different localities would cause problems. If you think it would, then you need to explain why

9 If you see no significant differences, there might be biological reasons (e.g. extensive gene flow)

10 Prof Steve Jones did not ONLY study Pyrenean snails. You know he did it in Oxfordshire too (from the videos).

11 Do not neglect drift as a process producing differences between populations

12 There were no C. hortensis on the site, only young C. nemoralis with no lip-band.

13 If you want to do a contingency table test in R you need to produce a matrix, e.g. using cbind:

  • pop1<-c(brown=0,pink=2,yellow=3,yellow_banded=6)
  • pop2<-c(brown=4,pink=5,yellow=2,yellow_banded=0)
  • contingency_tab<-cbind(pop1,pop2)
  • contingency_tab
              pop1 pop2
brown            0    4
pink             2    5
yellow           3    2
yellow_banded    6    0

  • chisq.test(contingency_tab)
Pearson's Chi-squared test
data: contingency_tab
X-squared = 11.4857, df = 3, p-value = 0.00937

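Unfortunately, the following does not do the same thing:
  • data<-pop1
  • data<-pop2
  • chisq.test(data)
The second assignment overwrites data with pop2, so chisq.test(data) only tests whether all the categories within pop2 are equally frequent. It does not compare pop1 with pop2.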
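The difference between the two calls can be checked directly. This is a minimal sketch using the counts from the example above; suppressWarnings() is only there because R warns that the chi-squared approximation may be poor when expected counts are this small.

```r
pop1 <- c(brown = 0, pink = 2, yellow = 3, yellow_banded = 6)
pop2 <- c(brown = 4, pink = 5, yellow = 2, yellow_banded = 0)

# Assigning twice does not combine the samples: 'data' now holds only pop2
data <- pop1
data <- pop2

# A single vector is treated as one sample tested against equal proportions
wrong <- suppressWarnings(chisq.test(data))
wrong$method

# A matrix with one column per population is treated as a contingency table
right <- suppressWarnings(chisq.test(cbind(pop1, pop2)))
right$method
```

Printing the method element is a quick way to confirm which test R actually ran before you report it.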

14 If you use R correctly, then it gives you the p-value straight away, so you don't need a Chi Square table of critical values. (The values in such tables are Chi-Squared values, allowing you to look up p-values)
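To see how the p-value and a table of critical values relate, here is a small sketch using the statistic and degrees of freedom reported in comment 13. pchisq() and qchisq() are the base R distribution functions for the chi-squared distribution.

```r
x_squared <- 11.4857  # test statistic from the example in comment 13
dof <- 3              # degrees of freedom from the same output

# Upper-tail probability: this is the p-value that chisq.test reports
p_value <- pchisq(x_squared, df = dof, lower.tail = FALSE)
p_value

# The entry a printed table would give for the 5% level with 3 dof
critical_5pc <- qchisq(0.95, df = dof)
critical_5pc

# Comparing the statistic to the critical value gives the same decision
x_squared > critical_5pc
```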

15 You should compare all categories (e.g. colours) between two sites in one chi square test. Picking and choosing creates more tests, and hence increases the chance of false positives (type I error: http://en.wikipedia.org/wiki/Type_I_and_type_II_errors)
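A rough sketch of why extra tests matter. It assumes the tests are independent and that the null hypothesis is true in every one of them, which is a simplification (tests on the same snails are not independent).

```r
alpha <- 0.05     # significance level used for each individual test
n_tests <- 1:6    # number of separate tests carried out

# Chance of at least one false positive among n independent tests
familywise_error <- 1 - (1 - alpha)^n_tests
round(familywise_error, 3)
```

With four separate tests, the chance of at least one false positive under these assumptions is already about 0.19, not 0.05.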

15a Citations in-text should not be in full. Also if you use the Name+date convention, you don't need to use numbered superscripts as well. See www.citethisforme.com/harvard

16 Alleles are not broken up by recombination, they are swapped between genetic backgrounds.

17 Do not use quotes for long passages you could put into your own words

18 As we covered at great length in the lectures - the p value does not indicate if the null hypothesis is true, or that the results are due to chance, and does not give you direct evidence about the alternative hypothesis at all. Also do not use p-values to indicate the size of an effect - p-values can be small because of large differences in frequency, or large sample sizes, or some combination.

19 It is a major error to infer that you have established an absence of selection if you have a p-value greater than 0.05. That could simply mean your sample size was too small to detect any effect, for example.
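The effect of sample size in comments 18 and 19 can be illustrated with a sketch. The counts below are invented for illustration, not taken from any real site: the second table has exactly the same relative frequencies as the first, with ten times as many shells.

```r
# Hypothetical counts: same relative frequencies, different sample sizes
small_sample <- cbind(habitatA = c(brown = 6, pink = 10, yellow = 24),
                      habitatB = c(brown = 10, pink = 14, yellow = 16))
large_sample <- small_sample * 10

small_test <- chisq.test(small_sample)
large_test <- chisq.test(large_sample)

# Identical differences in frequency, very different p-values
small_test$p.value
large_test$p.value

# The statistic scales with sample size when the proportions are unchanged
unname(large_test$statistic) / unname(small_test$statistic)
```

The non-significant result from the small sample does not show that the frequencies are equal. It only shows that this sample was too small to detect a difference of that size.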

20 Pseudo-replication is not beyond your control. You can avoid it with astute experimental design

21 Population bottlenecks and range expansion can both cause severe genetic drift, but they are not different processes - they are events that cause episodes of drift.

22 Locus is singular, loci is plural.

23 Text in quotes should attribute a source. "" for spoken, '' for written sources.

24 If you plot allele frequencies from populations, it makes sense to have the populations in geographic sequence on the X-axis, not alphabetical
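One way to do this in R is to reorder the values before plotting. The population names, their order along the transect, and the frequencies below are all hypothetical.

```r
# Hypothetical frequencies of yellow shells; names sort alphabetically by default
yellow_freq <- c(Bank = 0.48, Hedge = 0.50, Meadow = 0.62, Wood = 0.35)

# Assumed order of the populations along the transect
geographic_order <- c("Wood", "Hedge", "Bank", "Meadow")

# Index by name to put the populations in geographic sequence
ordered_freq <- yellow_freq[geographic_order]
ordered_freq
```

The reordered vector can then be passed to barplot() or plot() so that neighbouring bars are neighbouring populations.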

25 If you find no significant differences among populations it does NOT suggest that drift is in action. Similarly, significant differences are not necessarily caused by selection - drift CAN cause significant differences!
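A simulation sketch of the last point. It is deliberately simplified and every parameter value is an assumption: one locus with two alleles, no selection, no gene flow, pairs of isolated populations of 50 individuals that start at the same frequency and drift for 20 generations, followed by a sample of 100 alleles from each.

```r
set.seed(42)  # arbitrary seed so the sketch is repeatable

# Allele frequency after some generations of pure drift (Wright-Fisher sampling)
drift <- function(p0, n_individuals, generations) {
  p <- p0
  for (g in seq_len(generations)) {
    p <- rbinom(1, size = 2 * n_individuals, prob = p) / (2 * n_individuals)
  }
  p
}

# Let two populations drift independently, sample each, and test for a difference
one_comparison <- function(p0 = 0.5, n_individuals = 50, generations = 20,
                           sample_size = 100) {
  p <- c(drift(p0, n_individuals, generations),
         drift(p0, n_individuals, generations))
  allele_a <- rbinom(2, size = sample_size, prob = p)
  counts <- rbind(A = allele_a, a = sample_size - allele_a)
  suppressWarnings(chisq.test(counts)$p.value)
}

p_values <- replicate(200, one_comparison())

# Proportion of population pairs that differ significantly with no selection at all
mean(p_values < 0.05, na.rm = TRUE)
```

If that proportion is well above 0.05, the simulated populations often differ significantly through drift alone, which is why a significant test cannot by itself be read as evidence of selection.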

26 Selection and drift are present in all populations, it is their magnitude that is in question

27 Experimental designs with 3 habitats have limited replication within each habitat

28 Chi squared tests on contingency tables evaluate the relative frequencies of each category in each population, not the number (# means number, and is not really appropriate in text)

29 If you reject the null (at the 5% level, say) then you are rejecting the hypothesis of equal frequencies - not a difference between 'environment and phenotype', nor a hypothesis of different frequencies.

30 If you find a significant difference among populations you need to find some logic that suggests whether it is due to drift or selection.

31 If you say 'y is understood since ...x' the phrase x should explain why y is understood.

32 Selection on one trait (such as banding) does not imply there should be selection on another trait (like colour)

33 If your results are significant despite small sample size, then you should be confident in them (that's what the test is for)

34 You do not decrease the effect of drift by looking at replicates from the same habitat - you increase your ability to distinguish drift from selection.

35 If you wish to show consistency of frequencies WITHIN a habitat, you should test for differences in frequency among populations from the same habitat

36 Use 'fewer' for things you can count (not 'less')

37 Always check whether differences in frequency are significant or not. For example, if the 3 colours had the following counts, the frequencies are significantly different according to the chi squared test, even though they may look quite similar.

  • Ctab
       forest bush
brown      14   15
pink       20   35
yellow     67   48
  • chisq.test(Ctab)

Pearson's Chi-squared test
data: Ctab
X-squared = 7.2209, df = 2, p-value = 0.02704
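A sketch of how the table above could be built and inspected. The counts are those shown for Ctab; prop.table() with margin = 2 converts each column to relative frequencies, which is what the test compares.

```r
# Rebuild the table from comment 37: one column per habitat
Ctab <- cbind(forest = c(brown = 14, pink = 20, yellow = 67),
              bush   = c(brown = 15, pink = 35, yellow = 48))

# Relative frequency of each colour within each habitat (columns sum to 1)
relative_freq <- prop.table(Ctab, margin = 2)
round(relative_freq, 2)

# The test itself is run on the counts, not on the proportions
result <- chisq.test(Ctab)
result$statistic
result$p.value
```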

38 The null hypothesis is not about your results (not 'the [observed] frequencies will be...') but about the actual populations (e.g. the unobserved [we cannot sample every snail] relative frequencies in the populations are equal ...)

39 Equal allele frequencies are not evidence of drift

40 Red and white are not recognised colours for Cepaea shells

41 Observed values must be counts (whole numbers!)

42 If you compare 3 categories in 2 habitat types, that is 2 dof
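The degrees of freedom for a contingency table are (rows - 1) x (columns - 1). A short sketch, checked against what chisq.test reports for the counts from comment 37:

```r
n_categories <- 3  # e.g. brown, pink, yellow
n_habitats <- 2    # e.g. forest, bush

# Degrees of freedom for a contingency table: (rows - 1) * (columns - 1)
dof <- (n_categories - 1) * (n_habitats - 1)
dof

# chisq.test reports the same value in its 'parameter' element
Ctab <- cbind(forest = c(brown = 14, pink = 20, yellow = 67),
              bush   = c(brown = 15, pink = 35, yellow = 48))
chisq.test(Ctab)$parameter
```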

43 Drift produces very similar results to selection. One difference might be consistency of patterns across replicates

44 You can specify expected results given what previous authors have found, but also the expected patterns that would arise if drift were the major cause of differences among populations.

45 Confidence intervals give an indication of the range of possible parameter values (e.g. the range of likely values of a mean, or in your case e.g. the range of likely values of the frequency of yellow shells). They are not used to indicate a range of p-values.
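As a sketch of a confidence interval for a frequency, the forest sample from comment 37 (67 yellow shells out of 101) can be passed to binom.test(), which returns an exact binomial interval. Other interval methods exist, and this one is simply the choice made here.

```r
yellow_count <- 67   # yellow shells in the forest sample (comment 37)
total_count <- 101   # all shells in the forest sample (14 + 20 + 67)

# Point estimate of the frequency of yellow
estimate <- yellow_count / total_count
estimate

# 95% confidence interval for that frequency
ci <- binom.test(yellow_count, total_count)$conf.int
ci
```

The interval is a range of plausible values for the frequency of yellow in the population, not a range of p-values.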

46 If there is gene flow, it may even-out allele frequencies, but it will not necessarily make them identical, especially if there is strong selection or drift.

47 You cannot really infer much from the total number of snails in any one locality. This number will depend on how well hidden and how well preserved the shells are. You should compare the RELATIVE frequencies using a contingency table, e.g. the Chi squared test detects no significant difference here, whereas the NUMBERS are significantly different
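The contrast can be sketched with invented counts (these are hypothetical, not data from the practical): two sites with identical relative frequencies, where three times as many shells were found at the second site.

```r
# Hypothetical counts: same proportions of each colour, different totals
counts <- cbind(siteA = c(brown = 10, pink = 20, yellow = 30),
                siteB = c(brown = 30, pink = 60, yellow = 90))

# Contingency table test: compares RELATIVE frequencies between the sites
relative_test <- chisq.test(counts)
relative_test$p.value

# Test on the totals alone: compares the NUMBERS of shells found
totals <- colSums(counts)
numbers_test <- chisq.test(totals)
numbers_test$p.value
```

Only the first test addresses the colour frequencies. The second mostly reflects how many shells happened to be found at each site.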